Description

SYSTEM AND METHOD FOR INTERNET BROADCASTING OF MPEG-4-BASED STEREOSCOPIC VIDEO

Technical Field 

[1] The present invention relates to a Web broadcasting system and method; and, more
particularly, to a system and method for broadcasting a stereoscopic video to users on
the Internet based on Moving Picture Experts Group (MPEG)-4.

Background Art 

[2] A 'stereoscopic video' means a moving picture that is produced by receiving and
outputting left-eye data and right-eye data alternately to give a three-dimensional
effect of near and far distance to two-dimensional planes.

[3] Along with the recent development of the Internet, diverse multimedia data in the
fields of education, culture, current issues and the like are provided to Internet users.
Internet users can watch and/or listen to the multimedia data they want at any time and
at any place as long as they have clients connected to the Internet.

[4] Generally, Internet broadcasting systems are formed of an encoding server for
encoding multimedia data based on a predetermined encoding method, a streaming server
for transmitting the multimedia stream, and clients for decoding and outputting the
transmitted multimedia stream.

[5] Fig. 1 is a block diagram illustrating a typical Internet broadcasting system. As
shown, video data and audio data are inputted from a video/audio input device 10, such
as a video camera, and compressed as they pass through an encoding server 20.

[6] The MPEG is a group of moving picture experts formed to establish standards for
moving picture encoding methods. The MPEG studies the compression of moving pictures,
which vary continuously over time, and the transmission of coded data. The MPEG
proposes international encoding standards, and current Internet broadcasting is
performed based on those standards. In particular, MPEG-1 and MPEG-2 are international
standards used for compressing and storing large volumes of multimedia data.

[7] A streaming server 30 transmits the multimedia stream, which is encoded by the
encoding server 20, to clients 50 through the Internet 40. Then, the clients 50 decode
the transmitted multimedia stream. The clients 50 should have a player with a codec to
output the multimedia data.

[8] In the meantime, some problems may occur when stereoscopic video data are
transmitted using conventional encoding methods and the current Internet broadcasting
system. Since left-eye images and right-eye images should be encoded separately to
transmit stereoscopic video data to the clients through the Internet, the amount of data
more than doubles and the probability of transmission error becomes higher due to the
load of transmission traffic. Moreover, the clients should discriminate between the
left-eye images and the right-eye images in order to decode them and output them
synchronized with each other temporally. If the left-eye images and the right-eye
images are not outputted alternately, a three-dimensional effect cannot be obtained,
and viewers only suffer eye fatigue.

[9] Therefore, a new encoding method, other than the conventional encoding methods, is
required to broadcast stereoscopic video data on the Internet, as well as an Internet
broadcasting system and method that match the encoding method.

Disclosure of Invention

Technical Solution

[10] It is, therefore, an object of the present invention to provide a system and method
for broadcasting stereoscopic video data on the Internet by encoding and multiplexing
multimedia data based on a structure of Moving Picture Experts Group-4 (MPEG-4)
temporal scalability (TS).

[11] It is another object of the present invention to provide an Internet broadcasting
system and method that can also broadcast conventional two-dimensional video data on
the Internet.

[12] In accordance with one aspect of this invention, there is provided a system for
broadcasting stereoscopic video data to a client on the Internet, including: an encoding
server for encoding stereoscopic video data, audio data, and Object Descriptor/Binary
Format for Scene (OD/BIFS) data, which is information for controlling content, into an
elementary stream (ES) having an MPEG-4 structure; a web server for receiving from the
client any one among a two-dimensional video display mode, a field-shuttering video
display mode and a frame-shuttering video display mode; and a streaming server for
generating a real-time transport protocol (RTP) packet for real-time data transmission
on the Internet by multiplexing the ES based on the display mode inputted into the web
server, and transmitting the RTP packet to the client.

[13] In accordance with another aspect of the present invention, there is provided a
method for broadcasting stereoscopic video data to a client on the Internet based on
MPEG-4, including the steps of: a) receiving stereoscopic video data, audio data, and
OD/BIFS data, which is information for controlling content, and encoding the data into
an ES having an MPEG-4 structure; b) receiving any one among a two-dimensional video
display mode, a field-shuttering video display mode and a frame-shuttering video display
mode from the client; and c) generating an RTP packet for real-time transmission on
the Internet by multiplexing the ES based on the inputted display mode, and
transmitting the RTP packet to the client.

Description of Drawings 

[14] The above and other objects and features of the present invention will become
apparent from the following description of the preferred embodiments given in
conjunction with the accompanying drawings, in which:

[15] Fig. 1 is a block diagram illustrating a typical Internet broadcasting system;

[16] Fig. 2 is a block diagram depicting an Internet broadcasting system in accordance
with a preferred embodiment of the present invention;

[17] Fig. 3 is a block diagram showing an encoding server of Fig. 2 in detail;

[18] Fig. 4 is a block diagram showing an encoder of Fig. 3 in detail;

[19] Fig. 5 is a diagram showing video data inputted into each layer of a Moving
Picture Experts Group-4 (MPEG-4) structure in accordance with the preferred
embodiment of the present invention;

[20] Fig. 6 is a block diagram illustrating an MPEG-4 (MP4) file generator of Fig. 3 in
detail;

[21] Figs. 7 and 8 are diagrams describing arrangements of the elementary stream (ES)
of an MP4 file;

[22] Fig. 9 is a block diagram illustrating a streaming server of Fig. 2 in detail; and

[23] Fig. 10 is a diagram depicting a packing transformation process in the streaming
server.

Mode for Invention 

[24] Other objects and aspects of the invention will become apparent from the following
description of the embodiments with reference to the accompanying drawings, which
is set forth hereinafter. The terms and words used in the present specification and
claims should not be construed according to their conventional or dictionary meanings;
they should be construed as concepts and meanings that fit the technological concept of
the present invention, based on the principle that inventors may define the concept of
terms properly to describe the invention most appropriately. Accordingly, the embodiment
and drawings of the present specification are no more than one of the preferred
embodiments and do not represent all of the technological concept of the present
invention. In this respect, there may be various equivalents and modifications that can
replace the elements illustrated in the specification as of the filing of the present
patent application.

[25] Fig. 2 is a block diagram depicting an Internet broadcasting system in accordance
with a preferred embodiment of the present invention. As shown, multimedia data (i.e.,
stereoscopic video data or audio data) or content-controlling Object Descriptor/Binary
Format for Scenes (OD/BIFS) data obtained from a stereoscopic video camera or a
video/audio input device 100, such as a video tape recorder (VTR), are inputted into an
encoding server 200. Then, the encoding server 200 encodes the inputted signals based
on Moving Picture Experts Group-4 (MPEG-4). An elementary stream (ES) obtained
by encoding the signals in the encoding server 200 is transmitted to a streaming server
300.

[26] To encode the stereoscopic video, the present invention uses MPEG-4 temporal
scalability (TS). MPEG-4 TS is a structure where inputted left-eye images are
allocated to a base layer and right-eye images are allocated to an enhancement layer.
The left-eye images allocated to the base layer are encoded based on conventional
two-dimensional video encoding. The right-eye images allocated to the enhancement
layer are encoded with reference to the image of the base layer, which overlaps
with that of the enhancement layer.

[27] Meanwhile, a web server 400 receives information on the content and the display
mode requested by a client 600 through a back channel and transmits it to the streaming
server 300. The streaming server 300 multiplexes the ES of the content in the display
mode requested by the client 600 to generate multimedia data, e.g., a real-time
transport protocol (RTP) packet, and transmits the multimedia data to the client 600
through the Internet. The client 600 decodes and displays the data in the transmitted
order. To output the multimedia data, the client 600 must have a player with a codec.

[28] Fig. 3 is a block diagram showing an encoding server of Fig. 2. As shown, the
encoding server 200 includes an encoder 210, an encoding parameter unit 220, an
MPEG-4 (MP4) file generator 230 for generating an MP4 file by using the encoded
ES, and a storage 240 for storing the MP4 file.

[29] The encoding parameter unit 220 provides information for encoding the inputted
stereoscopic video. It sets up parameters for encoding, such as a size of an image, the
number of frames to be encoded, a frame rate, a size of motion search, a transmission
bit rate, and an initial quantization coefficient, and inputs them to the encoder 210.

[30] The encoder 210 encodes the inputted stereoscopic video data and audio data based
on the MPEG-4 TS and an audio codec. Internal modules of the encoder 210 are
illustrated in Fig. 4.
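The parameter set that paragraph [29] lists can be pictured as a small configuration record handed to the encoder. The following is only a sketch: the class name `EncodingParameters`, its field names, the units, and the example values are all hypothetical and do not come from the specification.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class EncodingParameters:
    """Hypothetical container for the parameters listed in paragraph [29]."""

    image_width: int         # size of an image, in pixels
    image_height: int
    frame_count: int         # number of frames to be encoded
    frame_rate: float        # frames per second
    motion_search_size: int  # size of motion search, in pixels
    bit_rate: int            # transmission bit rate, in bits per second
    initial_quantizer: int   # initial quantization coefficient

    def __post_init__(self) -> None:
        # Reject non-positive values before they reach the encoder.
        for f in fields(self):
            if getattr(self, f.name) <= 0:
                raise ValueError(f"{f.name} must be positive")


# Illustrative values only; none of them come from the specification.
params = EncodingParameters(
    image_width=720,
    image_height=480,
    frame_count=300,
    frame_rate=30.0,
    motion_search_size=16,
    bit_rate=1_000_000,
    initial_quantizer=10,
)
```

Validating at construction time keeps an invalid setting from being noticed only after encoding has started.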

[31] Referring to Fig. 4, the encoder 210 includes a video encoding module 212 for
encoding stereoscopic video data, an Elementary Stream Interface (ESI) information
generating module 216, an audio encoding module 218 for encoding audio data, and an
OD/BIFS encoding module 219 for encoding OD/BIFS data.

[32] The OD/BIFS encoding module 219 encodes the binary format for scenes (BIFS), which
describes audio and scenes, and the object descriptor (OD), which defines the
relationship between media streams.

[33] The ESI information generating module 216 generates additional information
needed for the transmission and decoding of the ES, such as a data length of the ES, an
idle flag, and a length of an access unit (AU), which are included in the header
information of a synchronization layer (SL). The header information of the SL will be
described later.

[34] The video encoding module 212 further includes a field separating module 213, a
base layer encoding module 214, and an enhancement layer encoding module 215. The
field separating module 213 separates stereoscopic three-dimensional video data into
a left-eye odd field, a left-eye even field, a right-eye odd field, and a right-eye even
field. The base layer encoding module 214 encodes the left-eye odd field, and the
enhancement layer encoding module 215 encodes the left-eye even field, the right-eye
odd field and the right-eye even field.

[35] Fig. 5 is a diagram showing the fields separated by the field separating module
being inputted into each layer of an MPEG-4 structure in accordance with the preferred
embodiment of the present invention. As shown, the left-eye odd field is inputted into
the base layer; the left-eye even field into a first enhancement layer; the right-eye odd
field into a second enhancement layer; and the right-eye even field into a third
enhancement layer.
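The field separation and layer assignment described above can be sketched as follows. This is a minimal illustration, not the encoder itself: frames are modeled as lists of scan lines, the convention that the odd field holds the 1st, 3rd, 5th, ... lines is an assumption, and the names `split_fields`, `assign_layers` and the layer labels are hypothetical.

```python
from typing import Dict, List, Sequence

Line = Sequence[int]    # one scan line of pixel values
Frame = Sequence[Line]  # a frame as an ordered list of scan lines


def split_fields(frame: Frame) -> Dict[str, List[Line]]:
    """Split one frame into its odd and even fields.

    Assumption: lines are counted from 1, so the odd field holds
    lines 1, 3, 5, ... (list indices 0, 2, 4, ...).
    """
    return {"odd": list(frame[0::2]), "even": list(frame[1::2])}


def assign_layers(left: Frame, right: Frame) -> Dict[str, List[Line]]:
    """Map the four fields of one stereo pair onto the layers of Fig. 5."""
    left_fields = split_fields(left)
    right_fields = split_fields(right)
    return {
        "base": left_fields["odd"],
        "enhancement_1": left_fields["even"],
        "enhancement_2": right_fields["odd"],
        "enhancement_3": right_fields["even"],
    }


# Tiny 4-line frames; each line is tagged with a distinct value.
left_frame = [[10, 10], [11, 11], [12, 12], [13, 13]]
right_frame = [[20, 20], [21, 21], [22, 22], [23, 23]]
layers = assign_layers(left_frame, right_frame)
```

In this sketch only the base layer carries a field that can be decoded on its own; the three enhancement layers hold the remaining fields of the same time instant.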

[36] Fig. 6 is a block diagram illustrating an MP4 file generator of Fig. 3. As shown, the
MP4 file generator 230, which receives the video/audio ES, the OD/BIFS ES and the ESI
information from the encoder 210, includes a media data providing module 232, a
metadata providing module 234 and an MP4 file generating module 236.

[37] The media data providing module 232 is a buffer for receiving the video ES, the
audio ES and the OD/BIFS ES, which are encoded on a field-by-field basis. It transmits
the ES to the MP4 file generating module 236.

[38] The metadata providing module 234 is a buffer for receiving the ESI information
transmitted from the encoder 210, and transmitting the ESI information as metadata
to the MP4 file generating module 236.

[39] The MP4 file generating module 236 converts the inputted ES and the metadata into
an MP4 file format. The purpose is to generate and store a file in a format suitable for
transmission, by receiving the ES outputted from the encoder together with the
additional information for the ES and extracting the ES that matches the display mode
requested by a user.

[40] An MP4 file has two zones: one is a metadata zone for storing file information, and
the other is an mdata Atom zone for storing the ES. Each ES stored in the mdata Atom
zone is given its own identifier, ES_ID, to discriminate between the encoded ES.

[41] Fig. 7 is an exemplary diagram illustrating an arrangement of the ES in the mdata
Atom for storing the media data, the ES being given four ES_IDs based on the left and
right, odd and even fields. Fig. 8 is an exemplary diagram illustrating an arrangement of
the ES for stereoscopic video data in the mdata Atom obtained by multiplexing the four
fields of the ES. The ES is inputted on a four-field basis, i.e., a left-eye odd field, a
right-eye even field, a left-eye even field and a right-eye odd field. One ES_ID is
allocated to the four fields having the same time information.
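The two arrangements of Figs. 7 and 8 can be contrasted with a small sketch. It assumes each field has already been encoded into a list of access units indexed by time; the function names, the dictionary layout, and the concrete ES_ID numbers are hypothetical, and only the four-field ordering is taken from paragraph [41].

```python
from typing import Dict, List, Tuple

# Field order of the multiplexed arrangement, as listed in paragraph [41].
MUX_ORDER = ("left_odd", "right_even", "left_even", "right_odd")

# Hypothetical ES_ID assignment for the separate arrangement.
SEPARATE_IDS = {"left_odd": 1, "left_even": 2, "right_odd": 3, "right_even": 4}


def arrange_separate(aus: Dict[str, List[bytes]]) -> Dict[int, List[bytes]]:
    """Fig. 7 style: each field is its own ES with its own ES_ID."""
    return {SEPARATE_IDS[name]: list(units) for name, units in aus.items()}


def arrange_multiplexed(
    aus: Dict[str, List[bytes]], es_id: int = 1
) -> Dict[int, List[Tuple[bytes, ...]]]:
    """Fig. 8 style: the four fields of one time instant share one ES_ID."""
    instants = len(aus["left_odd"])
    groups = [tuple(aus[name][t] for name in MUX_ORDER) for t in range(instants)]
    return {es_id: groups}


# Two time instants of placeholder access units per field.
encoded = {
    "left_odd": [b"LO0", b"LO1"],
    "left_even": [b"LE0", b"LE1"],
    "right_odd": [b"RO0", b"RO1"],
    "right_even": [b"RE0", b"RE1"],
}
separate = arrange_separate(encoded)
multiplexed = arrange_multiplexed(encoded)
```

With separate ES_IDs a server can pick individual fields per display mode, whereas the multiplexed form keeps everything belonging to one time instant together.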

[42] The MP4 file generated through the above processes is stored in a storage 240 and
extracted by the streaming server 300.

[43] Fig. 9 is a block diagram illustrating a streaming server of Fig. 2. As shown, the
streaming server 300 extracts MP4 files stored in the storage 240, or receives the ES
and ESI information encoded by the encoder 210, generates a real-time transport
protocol (RTP) packet that matches a user's request, and transmits it to a client
600.

[44] In order to generate the RTP packet that matches the user's request, the display
mode requested by the user should be inputted into the streaming server 300.
Accordingly, the display mode requested by the user should be inputted through the
client 600 and a web server 400 and then transmitted to the streaming server 300.

[45] In the Internet broadcasting system of the present invention, video data is encoded
after being divided into a left-eye odd field, a left-eye even field, a right-eye odd
field and a right-eye even field. Therefore, conventional two-dimensional video data,
field-shuttering three-dimensional video data and frame-shuttering three-dimensional
video data can all be processed in this system.

[46] For example, if a user wants the conventional two-dimensional video display, the
streaming server 300 transmits a stream of the left-eye odd field and the left-eye even
field. If the user wants the field-shuttering three-dimensional video display, it extracts
and transmits a stream of the left-eye odd field and the right-eye even field. Likewise, if
the user wants the frame-shuttering three-dimensional display, it transmits a stream of
all of the four fields.
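The selection rule in paragraph [46] amounts to a lookup from display mode to the field streams that must be sent. The sketch below makes that explicit; the enum, the dictionary and `select_streams` are hypothetical names, and the ordering of the four fields in the frame-shuttering case is an assumption borrowed from the four-field order given in paragraph [41].

```python
from enum import Enum
from typing import Dict, List


class DisplayMode(Enum):
    TWO_D = "2d"
    FIELD_SHUTTERING = "field_shuttering"
    FRAME_SHUTTERING = "frame_shuttering"


# Which field streams each display mode needs, per paragraph [46].
FIELDS_BY_MODE = {
    DisplayMode.TWO_D: ("left_odd", "left_even"),
    DisplayMode.FIELD_SHUTTERING: ("left_odd", "right_even"),
    # Assumed order, taken from the four-field listing in paragraph [41].
    DisplayMode.FRAME_SHUTTERING: (
        "left_odd", "right_even", "left_even", "right_odd",
    ),
}


def select_streams(mode: DisplayMode, streams: Dict[str, bytes]) -> List[bytes]:
    """Return only the field streams required for the requested mode."""
    return [streams[name] for name in FIELDS_BY_MODE[mode]]


# Placeholder streams standing in for encoded field data.
available = {
    "left_odd": b"LO",
    "left_even": b"LE",
    "right_odd": b"RO",
    "right_even": b"RE",
}
two_d = select_streams(DisplayMode.TWO_D, available)
field_3d = select_streams(DisplayMode.FIELD_SHUTTERING, available)
frame_3d = select_streams(DisplayMode.FRAME_SHUTTERING, available)
```

Because the server filters before transmission, a two-dimensional or field-shuttering client receives half of the field data that a frame-shuttering client does.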

[47] If the user's request for a display mode is inputted into an MP4 file analyzing
module 310 through the web server 400, the MP4 file analyzing module 310 extracts the
needed AU stream and ESI information from the MP4 files stored in the storage 240.
Here, the MP4 file analyzing module 310 can also receive the AU stream and the ESI
information from the encoder 210 in real-time.

[48] When the MP4 file analyzing module 310 extracts the AU stream and the ESI
information based on the request of the user, an SL packet generating module 320
generates an SL packet having a header and a payload for the extracted AU stream.
The header of the SL packet carries synchronization information for each packet and is
used to check continuity when data loss occurs. The header includes information for
controlling time synchronization, such as a time stamp. The payload of the SL packet is
the valid information that comes after the header. The payload includes the AU stream
extracted by the MP4 file analyzing module 310.

[49] The generated SL packet is inputted into a FlexMux packet generating module 330,
and the FlexMux packet generating module 330 generates a FlexMux packet
by adding a header that defines a packet type to the SL packet. The packet type means
information for distinguishing video data from audio data.

[50] The generated FlexMux packet is inputted into an RTP packet generating module
340. Then, the RTP packet generating module 340 generates an RTP packet that can
be transmitted through the Internet in real-time.

[51] The RTP packet is a protocol packet of a transport layer that makes it possible to
transmit data on the Internet in real-time. The RTP packet can be generated by adding
a header including information for real-time data transmission to a FlexMux packet.

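The nesting of AU, SL packet, FlexMux packet and RTP packet described in paragraphs [48] to [51] can be sketched with simplified headers. Several assumptions apply: the SL header layout in MPEG-4 is configurable, so the fixed 6-byte header here (sequence number plus time stamp) is hypothetical; the FlexMux header is modeled as a one-byte channel index plus a one-byte length, with channel numbers 0 and 1 chosen arbitrarily for video and audio; and the RTP header is the 12-byte fixed header with no CSRC entries and an assumed dynamic payload type of 96.

```python
import struct

VIDEO_CHANNEL = 0  # hypothetical channel index for video
AUDIO_CHANNEL = 1  # hypothetical channel index for audio


def sl_packet(au: bytes, seq: int, timestamp: int) -> bytes:
    """Prefix an access unit with a simplified, hypothetical SL header."""
    header = struct.pack("!HI", seq & 0xFFFF, timestamp & 0xFFFFFFFF)
    return header + au


def flexmux_packet(sl: bytes, channel: int) -> bytes:
    """Prefix an SL packet with a channel index and a one-byte length."""
    if len(sl) > 255:
        raise ValueError("payload too long for a one-byte length field")
    return struct.pack("!BB", channel, len(sl)) + sl


def rtp_packet(payload: bytes, seq: int, timestamp: int, ssrc: int,
               payload_type: int = 96) -> bytes:
    """Prefix a payload with a 12-byte fixed RTP header (version 2)."""
    first_byte = 2 << 6                # version 2, no padding/extension/CSRC
    second_byte = payload_type & 0x7F  # marker bit cleared
    header = struct.pack("!BBHII", first_byte, second_byte, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


# Wrap one placeholder access unit through all three layers.
access_unit = b"\x01\x02\x03"
sl = sl_packet(access_unit, seq=7, timestamp=3000)
fm = flexmux_packet(sl, VIDEO_CHANNEL)
rtp = rtp_packet(fm, seq=1, timestamp=3000, ssrc=0x1234)
```

Each layer only prepends its own header, so a receiver can undo the wrapping in reverse order: strip the RTP header, read the FlexMux channel and length, then parse the SL header to recover the time stamp and the access unit.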
[52] Fig. 10 is a diagram depicting a packing transformation process in the streaming
server. The RTP packet generated as above is transmitted to a client 600 through
the Internet in real-time, and a player mounted on the client 600 decodes the RTP
packet and displays it.

[53] If the packet is a field-shuttering three-dimensional video RTP packet, the player
can produce a three-dimensional distance effect by outputting the stream of the left-eye
odd field and the stream of the right-eye even field in the transmitted order, instead of
discriminating between the left-eye odd field stream and the right-eye even field
stream, synchronizing their time with each other, and then outputting them. In short,
since the RTP packet multiplexed by the streaming server 300 is packetized in the order
of the necessary field streams based on the display mode requested by the user, the
client 600 can output stereoscopic video data without additional data processing.

WO 2004/093459    PCT/KR2004/000871

[54] The Internet broadcasting system and method of the present invention can reduce
the amount of data considerably by encoding stereoscopic video data effectively,
thus reducing the probability of transmission errors. Therefore, it is possible
to broadcast stereoscopic videos on the Internet in real-time.

[55] In addition, the Internet broadcasting system of the present invention can restore not
only stereoscopic videos but also conventional two-dimensional videos based on the
display mode requested by the user.

[56] While the present invention has been described with respect to certain preferred
embodiments, it will be apparent to those skilled in the art that various changes and
modifications may be made without departing from the scope of the invention as
defined in the following claims.



